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Abstract 

Bayesian analysis for Markov jump processes is a non-trivial and challenging problem. 
Although exact inference is theoretically possible, it is computationally demanding thus its 
applicability is limited to a small class of problems. In this paper we describe the applica- 
tion of Riemann manifold MCMC methods using an approximation to the likelihood of the 
Markov jump process which is valid when the system modelled is near its thermodynamic 
limit. The proposed approach is both statistically and computationally efficient while the 
convergence rate and mixing of the chains allows for fast MCMC inference. The method- 
ology is evaluated using numerical simulations on two problems from chemical kinetics and 
one from systems biology. 

1 Introduction 

Markov Jump Processes (MJP) provides us with a formal description of the underlying stochastic 
behaviour of many physical systems and as such they have a wide applicability in many scientific 
fields. In chemistry and biology, for example, they are applied for modelling reactions between 
chemical species [TJ [2] . In ecology and epidemiology, they are used for modelling the popula- 
tion of interacting species in the environment [3] while in telecommunications they describe the 
population of information packets over a network pTJ. In order to introduce some terminology 
and notation we will give a more concrete example from chemical kinetics. However, the mod- 
elling methodology is similar in other applications although different assumptions are needed, 
depending on the system being modelled, for calculating reaction rates. Consider a model for 
the population of molecules of two interacting chemical species, Xa and Xb, in a solution of 
volume O, where Xa and Xb denote the number of molecules of chemicals A and B respectively. 
The interactions between the species are modelled using reactions which are specified using the 
following notation: R\ : A + B — > 2A. On the left hand side appear the reactants and on the 
right hand side the products of the reaction while over the arrow appears the rate constant c\ 
which is the probability that a randomly chosen pair of A and B will react according to R\. 
This reaction, for example, specifies that a pair of molecules A, B react with probability c\ to 
produce a new molecule of A. For calculating the probability of a reaction taking place given the 
current state of the system, i.e. the number of molecules of chemicals A and B, several system 
dependent assumptions must be made. For chemical reactions it is assumed that in a well stirred 
solution the probability of a reaction is proportional to the populations of its products [5]. For 
Ri we can write it as Ji(Xa, Xb,ci) = c\£1~ 1 XaXb- Following the same reasoning additional 
reactions and species can be added in order to construct large and complex reaction networks. 
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Together, the state of the system Xa,Xb, the set of reactions and the reaction rates specify a 
Markov Jump process where the occurrences of reactions are modelled as a Poisson process. 

For this particular example the probability of the reaction has a simple form and is linear 
with respect to the populations. However in many real applications this is often not the case 
while the rate constants, ci, are unknown. Given a fully specified MJP, i.e. a MJP with known 
parameters, rate constants and initial conditions, it is possible to perform exact simulation 
and obtain samples from the underlying stochastic process using the Stochastic Simulation 
Algorithm (SSA) of [I]. In many problems there are system parameters which are not specified 
or are unknown while it is relatively easy to collect partial observations of the physical process 
at discrete time points. The interest is therefore to obtain statistical estimates of the unknown 
parameters using the available data. 

As a consequence of the Markov property, MJPs satisfy the Chapman-Kolmogorov equation 
from which we can directly obtain the forward master equation describing the evolution of the 
system's state probability over any time interval. However, even for small and simple systems 
the master equation is intractable and it is not straightforward as to how partially and discretely 
observed data from the physical process should be incorporated in order to perform inference 
over unknown system parameters. Recently, [Bj have shown that it is possible to construct a 
Markov Chain whose stationary probability distribution is the posterior of the unknown param- 
eters without resorting to any approximations of the original MJP. Their method however is 
computationally expensive while the strong correlation between posterior samples means that a 
large number of MCMC iterations are required in order to obtain Monte Carlo estimates with 
sufficient accuracy. 

An alternative is to consider suitable approximations of the likelihood function. The system 
size expansion of [7, Chap. 10] provides a systematic method for obtaining approximations of a 
physical process approaching its thermodynamic limit. The most simple approximation yields 
the Macroscopic Rate Equation (MRE) which describes the thermodynamic limit of the system 
with a set of ordinary differential equations neglecting any random fluctuations. Although the 
MRE has been extensively studied in the literature, see for example, [HJE], it is not applicable 
for problems where information about the noise and the random fluctuations is necessary or 
the system is far from its thermodynamic limit. The diffusion approximation |10t [TTJ describes 
the physical process by a set of non-linear stochastic differential equations with state dependent 
Brownian motion. Similar to the master equation however, the likelihood is intractable. In 
|12j a transformation is applied such that the Brownian increments are independent of the 
system state and thus the system can be easily simulated. However this limits the applicability 
of the methodology into systems where such a transformation is possible. A more general 
methodology in presented in [13] where an approximation of the likelihood is used instead. 
Finally, a less studied approach for the purpose of inference is the Linear Noise Approximation 
(LNA) which conveniently decouples non-linearity in the diffusion approximation into a non- 
linear set of ordinary differential equations in the MRE and a set of linear stochastic differential 
equations for the random fluctuations around a deterministic state [7J Chap. 10], [H]. Recently, 
|15j have shown the simple analytic form of the approximate likelihood obtained by the LNA 
simplifies MCMC inference and can be applied to problems with relatively small number of 
molecules. 

A commonly employed algorithm for MCMC is the Metropolis-Hastings algorithm [16] , which 
relies on random perturbations around the current state using a local proposal mechanism. It 
should be noted here that the state of the Markov chain is different from the state of the 
stochastic process. In the MCMC context state refers the current values of the unknown system 
parameters whereas the state of the system refers to the value of the stochastic process at 
a given time. We will use the term state interchangeably for the rest of this paper and its 
meaning will be clear from the context. Due to the local nature of the proposal mechanism used 
by the Metropolis-Hastings algorithm, samples from the posterior exhibit strong random walk 
behaviour and auto-correlation. Tuning the proposal mechanisms to achieve good mixing and 
fast convergence is far from straightforward even though some theoretical guidance is provided 
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[TT] . MCMC methods, such as the Metropolis Adjusted Langevin Algorithm (MALA) [18] and 
the Hamiltonian Monte Carlo (HMC) [19], have also been studied in the literature and have 
been shown to be more efficient than random walk Metropolis-Hastings in terms of Effective 
Sample Size (ESS) and convergence rates on several problems. However, HMC and MALA also 
require extensive tuning of the proposal mechanisms, see for example [20J and [2T]. For MJPs 
the problem is compounded further since system parameters, such as probability rate constants 
of chemical reactions, are often highly correlated and whose values may differ by orders of 
magnitudes. The resulting posterior distributions have long narrow "valleys" preventing any 
local proposal mechanism from proposing large moves about the parameter space. 

More recently [22] proposed exploitation of the underlying Riemann manifold of probability 
density functions when defining MCMC methods thus exploiting the intrinsic geometry of statis- 
tical models, thereby providing a principled framework and systematic approach to the proposal 
design process. These algorithms rely on the gradient and Fisher Information matrix of the 
likelihood function to automatically tune the proposal mechanism such that large moves on the 
parameter space are possible and therefore improve convergence and mixing of the chains. In 
[U] this approach has been successfully applied for the MRE approximation of chemical reaction 
networks. For the LNA the Fisher Information and the gradient of the likelihood function can 
be easily obtained [2J. In this paper we study the application of the Riemann manifold MCMC 
methods for the LNA approximation and compare the mixing efficiency and computational cost 
with to the commonly used Metropolis-Hastings algorithm. Moreover we study how the the 
Markov chains and the resulting Monte Carlo estimates behave for systems which are far from 
their thermodynamic limit. The aim is to improve the efficiency of MCMC inference for MJPs 
in order to allow for larger and more complex models frequently encountered in biology and 
chemistry to be studied in more detail. 

In the next section we give a brief overview of Markov jump processes. The diffusion and 
linear noise approximations are presented in section [3j We then discuss MCMC and the Riemann 
manifold algorithms in section|H Numerical simulations are presented in section \6\ while section[7] 
concludes the paper. 



2 Markov Jump Processes 

A /^-dimensional stochastic process is a family of D random variables X(t) = [X\(t), . . . , Xf)(t)] T 
indexed by a continuous time variable t with initial conditions X(to) = Xf . A Markov Jump 
Process (MJP) is a stochastic process satisfying the Markov property such that 

N 

P [x(t ), . . . , x(t N )} = p[x(t )} J]p[x(ti)|A:(ti-i)], 

where the dependence on any parameters or other quantities has been suppressed. That is, the 
conditional probability of the system state at time ij only depends on state of the system at 
the previous time it— i. A MJP is characterised by a finite number, M, of state transitions with 
rates fj(x, 0, t) and state change vectors Sj = (sij, . . . , SD,j) T with j G [1, . . . , M]. fj(x, 6, t)dt 
is the probability, given the state of the system at time t, X(t) = x, of a jump to a new state 
x + Sj in the infinitesimal time interval [t,t + dt). For the problems we consider in this paper 
the transition rates not only depend on the current state and time but also on unknown rate 
parameters 9. From the Markov property we can directly obtain the conditional probability 
of the system being in state x at time t given initial conditions which is characterised by the 
master equation 

'—jj- 1 — = [fi( x ~ s v°^)p( x ~ s jA x o^o) ~ fj{x,0,t)p(x,t\x o ,t o )] . (1) 

j'=i 

Equation ([I]) in general form is intractable especially when the transition rate functions 
fj(-) are nonlinear with respect to the system state. Numerical simulation is also prohibitively 
expensive as the computational cost grows exponentially with D 
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However, given initial conditions X(to) = Xt and values for the unknown rate parameters 
we can simulate realisations of the MJP by first noting that the time r to the next state transition 
is exponentially distributed with rate A = YljLi fj( x t ,6,to) and the new state X(to + r) will 
be Xt + Sj with probability fj(xt ,0,t)/X. This results in an iterative algorithm from which 
we can forward simulate a complete trajectory for the stochastic process X(t), known as the 
Stochastic Simulation Algorithm (SSA) pQ in the chemical kinetics literature. 

From the specification of the MJP we can also write the likelihood function with respect to 
the parameters for a completely observed process X(t) at the time interval [0,T] as 

N / M 

P{X\0) = Y\fki{Xi-l,0,Ti-l)exp -Ti ^ fj>{Xi-l,0,Ti-i) 

i=l \ j'=l 

where N is the number of transitions occurred in the time interval [0, T], fq £ [1, . . . M] is the type 
of the i th transition and Tj, X{ are the time and state at the i transition respectively. Notice that 
the likelihood function corresponds to the generative process described by the SSA. By specifying 
a suitable prior and applying Bayes' theorem, we can obtain the posterior distribution p(0\ X) 
which we can use for inference over the unknown parameters [6]. 

In many problems of interest however we cannot observe the times and types of all transitions 
in a given time interval. Rather, we can only observe the state of the system X{ti) = Xi at 
discrete time points ti £ [0,T]. The solution proposed in [6] is to treat the trajectories, as well 
as the number, times and types of transitions, between observed time points as latent variables. 
This leads to a data augmentation framework [23] where a Markov Chain is constructed to 
sample from the joint posterior of the parameters and the latent variables. At each MCMC 
iteration the complete trajectory of the MJP process has to be simulated conditional on the 
observed data and the parameters which for some systems can be computationally demanding. 
Furthermore, due to the high dimensional nature of the simulated trajectory and the strong 
dependence on the system parameters and observed data the MCMC algorithm has very poor 
convergence and mixing properties requiring many samples from the posterior in order to obtain 
sufficiently accurate Monte Carlo estimates. Finally, a further complication that arises is that 
the number of transitions between two observed time points is also unknown and has to be 
sampled using a reversible-jumps type algorithm [25J. For more details see [6]. The resulting 
algorithm therefore is computationally demanding thus limiting its applicability on small and 
relatively simple MJPs. A more efficient version of the algorithm is also suggested in [6] where 
instead of simulating the trajectories between observations using the exact MJP an approximate 
proposal distribution is employed to sample trajectories which are accepted or rejected using 
the Metropolis-Hastings ratio. 



3 Diffusion and Linear Noise Approximations 

An alternative to working directly with the master equation and the original MJP is to consider 
approximations which provide for efficient simulation and possibly an easy to evaluate likelihood 
function for discretely observed data. Although the resulting posterior will also be approximate 
in nature, it can be sufficient for inferential purposes given that the system under consideration 
is near its thermodynamic limit. Here we describe the diffusion approximation and from that 
how we can arrive at the LNA. Our presentation is rather informal and follows [13] and pQ. 
For a more formal derivation the reader should refer to [7] and [11]. The requirement for these 
approximations to be consistent is the existence of a proportionality constant f2 which governs 
the size of the fluctuations such that for large the jumps will be relatively small and as both 
Q, and x tend to infinity approaching the system's thermodynamic limit then, 

fjfaorf^nfazAt), (2) 

where z = x/Q, and /<(•) are independent of For many physical processes where the fluc- 
tuations are due to the discrete nature of matter there is a natural parameter with such 
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properties. Examples of such parameters can be the system size in chemical kinetics, the capac- 
ity of a condenser in electric circuits or the mass of a particle [7J. 

3.1 Diffusion approximation 

In order to obtain a Langevin equation which closely matches the dynamics of the MJP it is 
assumed that there is an infinitesimal time interval dt which satisfies the following conditions 

fj(x t ,,e,t') a fj(x t ,0,t), Vt'e [t,t + dt),Vj e [1,M] (3) 
fj{ Xu G,t)dt^l VjG[l,M]. (4) 

The first condition constrains dt to be small enough such that the transition rate functions remain 
approximately constant. This implies that the number of transitions of type j is distributed as 
a Poisson random variable with mean fj(xt,6,t)dt and is independent from other transitions 
of type j' j. The second condition constrains dt to be large enough such that the number of 
transitions for each state is significantly larger than 1, which further implies that the Poisson 
distribution can be accurately approximated by a Gaussian distribution. It can be shown 
that we can choose dt and f2 such that both conditions can be satisfied and this generally occurs 
when the system approaches its thermodynamic limit. 

Given such a timescale, the state of the system at time t + dt can be computed by 

M 

x t+dt = x t + ^2M[f j (x t ,0,t)dt,f j (x t ,e,t)dt}s j (5) 
j=l 

where M[fi, cr 2 ] denotes a Gaussian random variate with mean \i and variance a 2 . From Equa- 
tion ([5]) we can directly obtain a Langevin equation of the form 

dx t = Sf(x t , 0, t)dt + SyJdmg[f(x t ,0,t)}dB t (6) 

where we used S to denote the matrix whose columns are the state change vectors Sj, /(•) to 
denote the vector whose elements are the transition rates fj(-), diag(i>) a function that returns 
a diagonal matrix with elements taken from the vector v and dBt an M dimensional Wiener 
process. Notice that the dimension of Xt differs from that of dBt. 

Due to the nonlinear state dependent drift and diffusion coefficients in Equation ^ the 
transition density of the stochastic process is also intractable. Therefore a data augmentation 
approach similar to the one in [6] has to be followed. However, there is no longer the need to 
sample the number, times and types of state transitions as the MJP is approximated with a 
continuous process. Moreover, the latent variables corresponding to unobserved states can now 
be efficiently simulated by an Euler-Maruyama scheme which is computationally more efficient 
than the SSA. This approach has been followed by |13j and [12] for inference over the unknown 
parameters 9 while in [27] a similar methodology has been applied on a real data from an 
auto-regulatory gene expression network. 

3.2 Linear noise approximation 

Substituting equation ^ in the Langevin equation © and dividing by f2 we get 

dz t = Sf{z t , 0, t)dt + ^S\J diag[/(* t , 9, t)]dB t (7) 

from which we can see that the fluctuations are of the order of 1/ y/Ti and in the thermodynamic 
limit (UJ reduces to the Macroscopic Rate Equation (MRE) 

lim dzt = Sf(zt,0,t)dt. 

Q— too 
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To obtain the Linear Noise Approximation (LNA) we make the assumption that for sufficiently 
large Q a solution to ([7]) will differ from the MRE by a stochastic term of order That is 

Z ' = i" + Wi (t (8) 

where 4> t are deterministic or sure variables satisfying the MRE and £ t are stochastic variables. 
Rewriting the transition rate functions using ([8]) and Taylor expand around we get 

We can now substitute (jHJ) and back into ([7]) and collect terms of 0(1) to get the expression 
for the differential of 4> which is nothing other than the MRE 

d<f> t = S~f(<f> t ,0,t)dt. (10) 

Finally, collecting remaining terms and neglecting terms of 0(l/\/0) and higher we get the 
differential of £ as 

d£ t = SJ;((f> t , 6, t)£ t dt + Sy/diag[~f(<t> t ,0,t)]dB t (11) 

where we used J j(-) to denote the Jacobian of the transition rates /(•). Equation (fTTj) char- 
acterises the fluctuations around the deterministic state <fi and its validity depends on the size 
of S7. As f2 increases the magnitude of the individual jumps Sj becomes negligible relative to 
the distance in 4> over which the non-linearity of fj(-) becomes noticeable. A measure of the 
sufficiency of LNA is the coefficient of variation, i.e. the ratio of the standard deviation to the 
mean. For a more thorough discussion on the validity of LNA the reader is referred to [23] and 
the supplementary material of |15j . 



3.3 Solution of the LNA and the approximate likelihood function 

LNA provides a convenient expression for the approximate likelihood since the MRE ()10p can be 
easily solved numerically and its computational cost is polynomial in D. Moreover, equation (jlip 
is a system of linear stochastic differential equations which has an explicit solution of the form 

£ t = *(io, t) (to + £ t^S^drngCfi^O^s^dB^j (12) 

where the integral is in the Ito sense and &(to,t) is the solution of 

d*(t , a) = SJ~ f {4> t , 0, t)*(to, s)ds, *(t , to) = I (13) 

Since the ltd integral of a deterministic function is a Gaussian random variable [28], equa- 
tion ()12|) implies that £ t has a multivariate normal distribution. To simplify further the anal- 
ysis assume that the initial condition for Zt has a multivariate normal distribution such that 
z to ~ Af(4> t , Vt ). For the rest of the paper we will assume that <j) t and Vt are known. In 
cases where the initial conditions are unknown they can be treated as additional parameters. 
Equations ((SJ [TUJ, [TTJ |T2|) and the specification of initial conditions further imply that 

* t ~W tJ n- 1 v t ) (14) 

where <j) t are solutions of the MRE and V% are solutions of 

dv t = sj f ~((i> t , e, t)v t + v t jf(<f> t , e, t)s T + sdia g [/(0 t , e, t)]s T . 

Finally, multiplying (|14[) by we get 

xt ~Ar(n(j> t ,nv t ). 
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Assume that we have observations from the stochastic process X(t) at discrete time points 
U 6 {ti, . . . , tpj}. Moreover, assume that each observation x t is obtained by a independent 
realisation of X(t). For example to obtain an observation at ii = 10 the SSA is used to simulate 
a trajectory from to to t\ and the state of the system at t\ is kept. For t% = 20 the SSA is again 
used to simulate a new trajectory from to to £2 keeping only the state of the system at £2 and 
the process continues until all necessary observations are gathered. This kind of data are very 
frequently encountered in biology where in order to obtain a single measurement the sample has 
to be "sacrificed" . This is common in data obtained using Polymerase Chain Reaction reporter 
assays [29] for example. See also [15] for an example of an inference problem with such data. 
Due to the independence between different observations and the Markov property the likelihood 
is simply 

N 

P (x\o) = l[M(x ti \n<t> ti ,nv ti ). (15) 

8=1 

In this paper we only consider observations of this kind. However the methodology is readily 
applicable when observations from a single realisation of X(t) are available. In this case the 
likelihood also has a simple form 

p{X\9) =./V[X|fifi(0),fiE(0)] 

where X = (x tl , . . . ,x tN ) T is an ND vector with all the observations, (i(0) = (4> tl , . ■ ■ , 4>n) 7 \ 
is also a ND vector with solutions of the MRE and £(0) is a ND x ND block matrix S(0) = 
{£(0)^' : i, j e [1, . . . , N]} such that 

*-{wf, (16) 

This stems from the fact that due to the Markov property and equation (j!4[) each Xf t can be 
written as a sum of multivariate normal random variables and therefore X is also a multivariate 
normal random variable. For more details refer to the supplementary material of [15j and 
[2]. The only additional complication which arises for time-series data is that the off-diagonal 
components of the LNA variance in equation (|16|) need to be estimated by numerically solving 
the system of ODEs in equation (|13p . Notice that despite the fact that the variance matrix is full 
we can still exploit the Markov property and write the likelihood as a product of the conditional 
likelihoods and therefore avoid the cost of inverting the ND x ND variance matrix. 

4 Markov Chain Monte Carlo Methods 

In this section we give a brief overview of the MCMC algorithms that we consider in this work. 
Some familiarity with the concepts of MCMC is required by the reader since an introduction to 
the subject is out of the scope of this paper. 

4.1 Metropolis-Hastings 

For a random vector 6 M. D with density p(0) the Metropolis-Hastings algorithm employs a 
proposal mechanism q(9*\0 t ^ 1 ) and proposed moves are accepted with probability 

min{l,p(0*)g(0*- 1 |0*)/^(0*- 1 )g(0*|0 t - 1 )} 

. In the context of Bayesian inference the target density p(0) corresponds to the posterior dis- 
tribution of the model parameters. Tuning the Metropolis-Hastings algorithm involves selecting 
the right proposal mechanism. A common choice is to use a random walk Gaussian proposal of 
the form q(6*\6 t ^ 1 ) = A/"(0*|0* _1 , £), where Af(-\fi, S) denotes the multivariate normal density 
with mean and covariance matrix S. 

Selecting the covariance matrix however, is far from trivial in most cases since knowledge 
about the target density is required. Therefore a more simplified proposal mechanism is often 



7 



considered where the covariance matrix is replaced with a diagonal matrix such as X = el where 
the value of the scale parameter e has to be tuned in order to achieve fast convergence and good 
mixing. Small values of e imply small transitions and result in high acceptance rates while the 
mixing of the Markov Chain is poor. Large values on the other hand, allow for large transitions 
but they result in most of the samples being rejected. 

Tuning the scale parameter becomes even more difficult in problems where the standard 
deviations of the marginal posteriors differ substantially, since different scales are required for 
each dimension, and this is exacerbated when correlations between different variables exist. 
Adaptive schemes for the Metropolis-Hastings algorithm have also been proposed [30] though 
they should be applied with care [31] . Parameters such as reaction rate constants often differ 
orders of magnitude, thus a scaled diagonal covariance matrix will be a bad choice for such 
problems. In the numerical simulations in the next section we used a Metropolis within Gibbs 
scheme where each parameter is updated conditional on all others using a univariate normal 
density with a parameter-specific scale parameter. This allows us to tune the scale for each 
proposal independently and achieve better mixing. 



4.2 Manifold Metropolis Adjusted Langevin Algorithm 

Denoting the log of the target density as C(6) = logp(0), the manifold MALA (MMALA) 
method, [22], defines a Langevin diffusion with stationary distribution p(6) on the Riemann man- 
ifold of density functions with metric tensor G(0). By employing a first order Euler integrator to 
solve the diffusion a proposal mechanism with density q(6*\9 t ^ 1 ) = M {6* \^i{6 t ~ l , e), e 2 G~ 1 (9 t ~ 1 )) 
is obtained, where e is the integration step size, a parameter which needs to be tuned, and the 
dth component of the mean function e)d is 

2 D D 

»(e,e) d = d + i (G- 1 (0)V 9 £(0)) r e 2 J^G(9) ij 1 r; j (17) 

i=i j=i 

where • are the Christoffel symbols of the metric in local coordinates [32] , 

Similarly to MALA [IS], due to the discretisation error introduced by the first order ap- 
proximation, convergence to the stationary distribution is not guaranteed anymore and thus 
the Metropolis-Hastings ratio is employed to correct this bias. The MMALA algorithm can be 
simply stated as in Algorithm [1] and more details can be found in [22]. 

Algorithm 1 MMALA 



1: Inititialise 6° 

2: for t = 1 to T do 

3: 9* ~ Af(0|/i(0*-\ e), e 2 G~ 1 {e t - x )) 

4: r = min{l,p(0*)g(0*- 1 |0*)/p(0*- 1 )g(r|0 i - 1 )} 

5: u ~ ZY[ ,1] 

6: if r > u then 
7: 0* = 6* 

8: else 

9: e t = e 1 - 1 

10: end if 

11: end for 



We can interpret the proposal mechanism of MMALA as a local Gaussian approximation 
to the target density similar to the adaptive Metropolis-Hastings of [33 1. In contrast to [33] . 
the effective covariance matrix in MMALA is the inverse of the metric tensor evaluated at the 
current position and no samples from the chain are required in order to estimate it, therefore 
avoiding the difficulties of adaptive MCMC discussed in [31] . Furthermore a simplified version of 
the MMALA algorithm (SMMALA) can also be derived by assuming a manifold with constant 
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curvature, thus cancelling the last term in Equation (|17j) which depends on the Christoffel 
symbols. Finally, the MMALA algorithm can be seen as a generalisation of the original MALA 
|18| since, if the metric tensor G{9) is equal to the identity matrix corresponding to an Euclidean 
manifold, then the original algorithm is recovered. 



4.3 Manifold Hamiltonian Monte Carlo 

The Riemann manifold Hamiltonian Monte Carlo (RMHMC) method defines a Hamiltonian on 
the Riemann manifold of probability density functions by introducing the auxiliary variables 
p ~ A/"(0, G(6)), which are interpreted as the momentum at a particular position and by 
considering the negative log of the target density as a potential function. More formally, the 
Hamiltonian defined on the Riemann manifold is: 

H(0,p) = -C{6) + ilog(27r|G(0)|) + ^ P T G{6)- l p (18) 

where the terms -£(0) + \ log (2tt|G(0)|) and \p T G{0)- p are the potential energy and kinetic 
energy terms, respectively. Simulating the Hamiltonian requires a time-reversible and volume 
preserving numerical integrator. For this purpose the Generalised Leapfrog algorithm can be 
employed and provides a deterministic proposal mechanism for simulating from the conditional 
distribution, i.e. 0*\p ~ p(0*\p). More details about the Generalised Leapfrog integrator can 
be found in [22] . To simulate a path across the manifold, the Leapfrog integrator is iterated 
L times which along with the integration step size e are parameters requiring tuning. Again, 
due to the integration errors on simulating the Hamiltonian, in order to ensure convergence 
to the stationary distribution the Metropolis-Hastings ratio is applied. Moreover, following the 
suggestion in |20] the number of Leapfrog iterations L is randomised in order to improve mixing. 
The RMHMC algorithm is given in Algorithm [2j 



Algorithm 2 RMHMC 



Inititialise 0° 
for t = 1 to T do 

pS-a^pIcg^- 1 )) 
el = e 1 - 1 

e ~ U[ Qt i] 
N = ceil(eL) 

{Simulate the Hamiltonian using a generalised Leapfrog integrator for N steps} 
for n = to N do 



9: 

10: 

11: 
12: 

13 
14 
15 
16 
17 
IS 
19 
20 



solves' 2 =p?-%V g H [ 0%,p* 
solve 0™ +1 = 0™ + 4 

P* 



V p H[O n „pT* 



+ V V H 0™ +1 



,P* 



1 n +2 
,P* 



end for 

(0*,P*) = 
{Metropolis-Hastings ratio} 
r = min{l,exp (-H(0*,p*) + F(0^ 1 ,p^ 1 )) } 

u ~ ^[0,1] 
if r > u then 
0* = 0* 

else 

0* = 0* 1 
end if 
end for 
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Similar to the MMALA algorithm, when the metric tensor G(9) is equal to the identity matrix 
corresponding to an Euclidean manifold, then RMHMC is equivalent to the HMC algorithm of 



5 Implementation details 



5.1 Gradient and metric tensor for the LNA 

For the manifold MCMC algorithms discussed in this section we will need the gradient of the 
log likelihood as well as a metric tensor for the LNA. For density functions the natural metric 
tensor is the expected Fisher Information, 1(9), [34j and for a multivariate normal with mean 
fJ>(9) and covariance matrix E(0) its general form is 



1(9) 



(9 



dE(fl) 



V-\9) 



dE(fl) 



For the likelihood in equation (|15p the Fisher Information is then a sum of N matrices 1(9, t), 
one evaluated at each time point. Similarly the general form of the partial derivatives for the 
log of a multivariate normal is 



dlnA/>|/x(6>),£(6>)] 



2 



(cc T -^-\9)) 



dE(fl) 



+ cr 



,dn(9) 



where c = T;~ 1 (9)[x - fi(9)]. 

Moreover, during the leap-frog integration for the RMHMC and for the mean function of 
MMALA the partial derivatives of the Fisher Information are needed. Their general form is 



dl(9) 



de k 



d 2 y(9) q 

d6ide k 

1 

2 
1 



-a, + a 



a, 



,dE(0) 



-a, 



+ -Tr 



Tr [A k (AiAj + AjAi)] 
dS(0 



9) 



gE(g) 

dOidO k j + dOjdOk ' 



-lgg(g) 



where = S -1 dl ^f^ and Aj = E" 

The above quantities require first and second order sensitivities for the cf> and V which we 
obtain by augmenting the ODE systems with the additional sensitivity equations. For an ODE 
system of n y equations with form y = F(y, t, 9), y(to) = y$(9) and ng parameters 9, the first 
and second order forward sensitivity equations are given by (|19p and (|20p respectively. 



d9~ v d9 + 9 ' 



dy(t ) dy 



09 



89 



(19) 



d 2 y 
89d9 T 



+ 



dy 

d 2 y(t ) _ 8 2 y 



+ 



dy T 
89 



F ^-+F a 

' 89 y 



89 2 



89 2 



(20) 



We use Fq to denote the n y x rig matrix where its j column is the partial derivatives of F 
with respect to 8j. Fg y denotes the derivative of Fq with respect to y and is an rig ■ n y x n y 
matrix where its j th column is the partial derivatives of vec(Fg) with respect to yj. I ny denotes 
the n y x n y Identity matrix, ® the Kronecker product and vec(A) an operator that creates a 
column vector by stacking the columns of matrix A. 
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5.2 Re-parameterisation 



In many problems the parameters 9 can be constrained in certain parts of W 10 where uq is the 
number of parameters. In models of chemical kinetics for example, rate parameters must be 
positive and can differ by orders of magnitude. For the MCMC algorithms described in the 
previous section we will need a re-parameterisation in order to allow the algorithms to operate 
on an unbounded and unconstrained parameter space. 

For the numerical simulations in section [6] we use a log 10 re-parameterisation by introducing 
the variables 9 P = log 10 (# p ), p £ [1, . . . , rig]. To ensure that we sample from the correct posterior 
the joint density is scaled by the determinant of the Jacobian such that p(X\9)p(9)\J (9)\ where 
J(9) is a rig x rig diagonal matrix with elements J(9) P:P = W Sp log(10). 

The gradient and Fisher information along with its partial derivatives follow from the chain 
rule as 

V d £(0) = V e £(0)J(0) 
1(9) = J(9) T I(9)J(9) 

m = 2J(9f m m + J(9f d ^J(9) d -^ 

89 p y ' y 1 d6 p K ' d9 p v J d6 p 



5.3 Choice of priors 

In Bayesian statistics priors provide the means for incorporating existing knowledge for the pa- 
rameters in question. The choice of a suitable prior distribution can be informed from knowledge 
about the process being modelled, the experimental design and empirical observations. For ex- 
ample we might want to restrict rate parameters in chemical kinetics from becoming very high 
since we assume from the experimental design that reactions are slow enough to be able to be 
observed. In some cases the model itself can also guide the choice of the prior. For example 
when a model is only defined for a certain range of values of the parameters, a prior restricting 
the parameters in that range should be used. 

In the numerical simulations of the next section we use independent normal priors for the 
parameters 9. Due to the re-parameterisation introduced earlier, this corresponds to a log- 
normal prior with base 10 for the parameters 9. This choice allows parameters to differ several 
orders of magnitude while it ensures they are strictly positive. Moreover, as noted in [22] the 
negative Hessian of the prior is added to the Fisher information in order to form the metric 
tensor used during MCMC sampling. This has the added benefit of regularising the Fisher 
information when it is near-singular [9] although we have not observed such problems in the 
simulations presented here. 



6 Numerical Simulations 

6.1 Chemical kinetics 

In this section we consider two examples from chemical kinetics [H] and study the effect of 
the system size parameter on inference using MCMC. The first system consists of three species 
where an unstable monomer, Si, can dimerise to an unstable dimer, S 2 , which is then converted 
to a stable form, S3. The reaction set for this system is 



m 


:S l 







R2 : 


2S 1 




s 2 


R3 


:S 2 




2S_ 


R4 


:S 2 


— ^> 


S3 
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min. ESS vs. f2 



Q 


M.H. 


SMMALA 


RMHMC 


1 


121 (3.6) 


150 (3.9) 


245 (0.06) 


2 


226 (6.7) 


2163 (57.2) 


4775 (1.3) 


5 


132 (3.9) 


3539 (93.6) 


4618 (1.2) 


10 


180 (5.3) 


3397 (89.8) 


5954 (1.6) 


100 


214 (6.4) 


3725 (98.5) 


6066 (1.7) 



Table 1: Comparison of minimum Effective Sample Size (ESS) and time normalised min. ESS for 
different values of the system size parameter of the decay dimerisation reaction model. Time 
normalised ESS is given in parenthesis. Results are calculated from 10,000 posterior samples. 

and the state of the system at time t will be denoted by X(t) = [Si(t), S2(t), Ss(t)] T . The 
propensity functions, or state transition probabilities are f(X, 6) = \c\S\{t), C2&~ 1 Si(t)(Si(t) — 
l)/2, csS2(t), C4S3(t)] T and the corresponding state change matrix is 

/ -1 -2 2 \ 
S= 1 -1 -1 . (21) 
\ 1 / 

For our experiments we will assume that initial conditions are known and set them to Si (to) = 
5fi, 5*2 (to) = S3 (to) =0, to = 0. Moreover we will set the reaction rate parameters to c\ = 1, 
62 = 2Q , C3 = 0.5 and C4 = 0.04. Notice that we make explicit the relation between the 
system size and parameter 62 and we will infer rate C2 up to a proportionality constant. For 
all the experiments we simulate data using the SSA of [1] for the time interval t £ [0, 10] and 
we discretise such that U — tj_i = 0.1. Each observation X(tj) is obtained independently by 
simulating a trajectory from to to tj and keeping only the last state discarding the rest of 
the trajectory. Moreover for each time point tj we also simulate 10 independent observations. 
Since each observation is obtained by a different trajectory of the MJP we assume that initial 
conditions do not have a point mass rather for each trajectory we sample its initial condition 
from a Poisson with means 5*1(0), 5*2(0), 53(0). 

We use the synthetic data to perform inference for the rate parameters 6 = (ci, 62, C3, Ci) T 
by drawing samples from the posterior 

jv 10 

P {o\x) oc p(e) 11 11 Af[x r (t0|sty(t0, nvfe)] 

i=l r=l 

where r indexes independent observations for the same time point. For all simulations in this pa- 
per we assume that the means for the initial conditions are known. Following similar arguments 
as for the derivation of the LNA in Section [3l namely that as the system approaches its thermo- 
dynamic limit transition densities become Gaussian, the initial conditions for the ODE systems 
for the mean and variance of the transition densities are 0(0) = -X"(0)f2 -1 and V(0) = /, where 
J is the identity matrix. In a more realistic scenario the initial conditions must be included as 
additional parameters in 0. For all parameters we used an independent log- normal prior with 
base 10, zero mean and one standard deviation and chains are initialised by drawing a random 
sample from the prior. For the Metropolis-Hastings sampler we set the initial proposal scale 
parameters to ~ le -6 and automatically adapt them every 100 samples during the burn-in phase 
in order to achieve an acceptance rate of 25% — 30% [17] . The same adaptation strategy was 
followed for the simplified MMALA and RMHMC algorithms where the initial step size was also 
set to « le~ 6 and was tuned in order to achieve acceptance rates in the order of 70 — 80% [22] . 
Finally, the number of leap-frog steps for RMHMC was fixed to 5. We have found that a burn in 
period of 10,000 to 20,000 samples was adequate for all algorithms to converge to the stationary 
distribution. 
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Posterior mean and SD. vs. f2 



u 


Cl 


C2 


C3 


c 4 


True 


1 




0.5 


0.04 


1 


0.88 (0.031) 


1.72 (0.253) 


0.39 (0.039) 


0.003 (0.002) 


2 


1.3 (0.041) 


0.69 (0.066) 


0.35 (0.016) 


0.014 (0.002) 


5 


0.93 (0.019) 


0.39 (0.028) 


0.48 (0.025) 


0.034 (0.002) 


10 


1.0 (0.015) 


0.18 (0.008) 


0.47 (0.015) 


0.037 (0.001) 


100 


0.99 (0.004) 


0.01 (0.0002) 


0.52 (0.004) 


0.039 (0.0003) 



Table 2: Marginal posterior means and standard deviations calculated from the RMHMC chain 
for different values of the system size parameter of the decay-dimerisation reaction model. 
Notice that C2 parameter is proportional to Q. Results are calculated from 10,000 posterior 
samples. 



— « — Q = 5 

Q = 1 

Q = 100 



0.4 0.45 0.5 0.55 0.6 0.03 0.035 0.04 

c 3 c 4 

Figure 1: Marginal posteriors for parameters C3 (left panel) and C4 right panel for different values 
of Q. Results are obtained by 10,000 posterior samples using RMHMC. 

Table Q] compares the minimum Effective Sample Size (ESS) and the time normalised ESS 
obtained by all algorithms for different values of the system size parameter 0. The SMMALA and 
RMHMC samplers utilise the gradients and the Fisher Information of the approximate likelihood 
obtained by the LNA in order to make efficient proposals. As the system size increases and thus 
the LNA better approximates the true likelihood then mixing of the manifold MCMC algorithms 
improves. For this particular example we can see that good mixing can be achieved even for very 
small systems with only ~ 25 molecules, (0 = 5). The M.H. sampler is not affected by the system 
size but its mixing is very poor in all cases. From the time normalised ESS we can also see that 
despite the improved mixing of RMHMC the computational cost is significant. On the contrary 
SMMALA provides a good tradeoff between mixing efficiency and computational cost. Finally, 
Table [2] reports the marginal posterior means and standard deviations for different values of Q 
obtained by RMHMC. The marginal posteriors for parameters C3 and C4 with >= 5 are also 
shown in Figure [TJ Results from the MH and SMMALA samplers are similar and are omitted. 
For small system sizes we can observe that there is an increased bias of the Monte Carlo estimate 
while the posterior standard deviation is higher reflecting the high degree of uncertainty around 
the mean. The bias however significantly reduces as the system size increases and for >= 5 
reasonable estimates can be obtained. 

The second example from the chemical kinetics literature that we consider is the Schlogl 
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Figure 2: Simulated time point data using SSA for the Schlogl reaction set and LNA predictions. 
Dots correspond to simulated data. The bold and dashed red lines correspond to the LNA 
prediction for the means and standard deviations using the true parameters. Doted blue lines 
correspond the LNA predictions using the posterior means for the rate parameters. (Online 
version in colour.) 



■1, 1, 



■i0 



(22) 



f(x,e) = 



c 3 n, 
V C4S1 



1), 



2), 



(23) 



reaction set. 



Rl : 2Si 


ciO" 1 


3S 


R2 : 35i 


c 2 n-\ 


2S 


R3 : 




5*i 


R4 : Si 




0. 



The corresponding state transition rates and state change matrix are given in equations (|22p 
and (|23p respectively. The state of the system consists only of the number of molecules of a 
single species X(t) = Si(t). The system is known to have two stable states which appear at 
different times depending on the size of the system. [T5] have shown that the LNA fails to 
provide a reasonable approximation of this system even for large concentration numbers. Their 
numerical experiments demonstrate that the LNA approximation can only approximate one of 
the two modes depending on the initial conditions. Here our aim is to show that using the 
LNA to obtain an approximate posterior over the unknown reaction rate constants can be very 
misleading for bi-stable systems. Using the resulting posterior means for the reaction rates gives 
us an LNA that fails to approximate any of the two stable modes. 

To demonstrate that we follow the same experimental procedure as in the previous example. 
That is, we simulate data using the SSA for the time interval ti G [0, 10], ti — ti-\ = 0.1 with 
fixed rate parameters and then use this data for posterior inference of the rate parameters using 
MCMC. Values for the true rate parameters and initial conditions where set as in |14| . Namely, 
d = 0.003, c 2 = 0.0001, c 3 = 200, c 4 = 3.5 and X(t ) = 280ft, where ft was fixed to I. After 
10,000 burn-in samples all samplers converged to a posterior distributions with mean 

E p (0|x)[0] ~ (0.130, 3.3e~ 4 ,3.5e +3 ,26.22) T 

and variance 

var p(0 |x)[0] ~ (1.2e- 4 ,8.2e~ 10 ,8.6e +4 ,4.53) T 
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Figure 3: Schematic representation of the auto-regulatory gene expression model with a negative 
feedback loop. A gene is transcribed into mRNA which is translated to a protein that suppresses 
gene transcription. (Online version in colour.) 

The LNA obtained by using the posterior means for the rate constants is shown in Figure [2] along 
with the data obtained by the SSA and the LNA using the true values for the rate constants. 
We can see that the LNA approximation obtained by the posterior means fails to approximate 
any of the two modes. Rather it approximates the empirical mean and variance of the data. 



6.2 Single gene expression 

Finally, to illustrate the applicability of the methodology to systems biology we also consider 
a simplified model for the biochemical reactions involved in the expression of a single gene to 
protein. The model presented in this section is the same with the model used in the study of 
|15] and we adopt the same notation in order to make comparisons easier. Gene expression 
is modelled in terms of three biochemical species; DNA, mRNA and protein; and four chem- 
ical reactions or state transitions; transcription, mRNA degradation, translation and protein 
degradation. The model can be written in chemical reaction notation as 



: DNA 


M*) 


DNA + R 


R2 : R 







R3 : R 


k P 


R + P 


RA : P 


7P^ 


0. 



The system state at time t is X(t) = [R(t),P(t)] T where R(t) and P(t) are the number of 
mRNA and protein molecules respectively. The corresponding state dependent transition rates 
are f(X,t) = [k R {t)^ R R(t),kpR{t),^pP(t)] T where jp,kp and 7p are unknown reaction rate 
constants. kp(t) is the time dependent transcription rate of the gene which for the purposes of 
this section is modelled as 

k R (t) = 6 exp(-6i(t - b 2 f) + b 3 

where all the 6,s are also unknown parameters controlling gene transcription. This corresponds 
to a transcription rate that due to some stimulus (experimental or environmental) increases for 
t < 62 and then it drops towards the base line 63 for t > 62. Finally, the state change matrix for 
this set of reactions is given in equation (|24p . 

As in the study of |15] we also consider a non-linear extension of this model where the 
transcription rate of the gene kp(t) is a function of the protein concentration that the gene is 
transcribed to. This is modelled using a Hill function 

k R (t,P) = k R (t)/(l + (P/H) n «) 
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Figure 4: Data simulated from the single gene expression model using SSA. Figures (a), (b), 
for the linear model and Figures (c), (d), for the auto-regulatory model. Dots correspond to 10 
independent draws for each time point. The bold line is the mean predicted by LNA with the 
true model parameters and the dashed lines are the H — 2x standard deviation predicted by 
LNA. Left column shows the mRNA molecules and right column the protein. (Online version 
in colour.) 

where for the experiments of this section we will set H = 63/cp / '(27_r7p) an d flu = 1/2 making 
the protein an inhibitor of mRNA transcription. A schematic representation of this model is 
shown in Figure For the rest of this section we will refer to this model as the auto-regulatory 
single gene expression model. 

Using the transition probabilities f(X,t) and matrix S we simulate synthetic data using 
the Stochastic Simulation Algorithm (SSA) [1] and sample at discrete time points. Values for 
the unknown rate constants and the parameters controlling gene transcription are shown in 
Table [3 The time interval is taken to be tj 6 [0, 25] while the interval between two observations 
ti — ti-\ = 0.25. Each time point is sampled from an independent trajectory by starting the 
SSA from to an d simulate up to t{ keeping only the state X(ti) and discarding the rest of the 
trajectory. This resembles the experimental conditions often encountered in biology where in 
order to make an observation the sample has to be "sacrificed" . Finally for each time point we 
also generate 10 independent observations from different trajectories. Initial conditions X(to) 
are simulated from a Poisson distribution with means 63/7/2 and 63/cp / '(7_r7p) f° r the mRNA 
and protein molecules respectively. The system size parameter $7 is considered to be unknown 
and for this experiment is set to 1 such that concentrations are equal to the number of molecules. 
Figures ([Ha) and (jHb) show data simulated from this process from the singe gene expression 
model as well as the LNA prediction. Simulated data for the auto-regulatory model are presented 
in Figures (JHc) and (|Ud). 

We use the simulated data to infer the unknown parameters 9 = (7p, A;p, 7p, 6o, &1) ^2, &5) T 
by sampling using MCMC from the LNA approximate posterior 

N R 

P {9\X) oc p(X\0)p(6) = p(0) H H WXrWMU), V{ti)\ 

i=l r=l 
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Figure 5: Marginal joint posterior for parameters jp,kp left panel, Figure (a), and 7^,63 right 
panel, Figure (b) for the single gene expression model. Dashed lines are the true values used 
to generate the synthetic data. Dots are samples from the posterior. Iso-contours and shaded 
region are obtained by kernel density estimation using posterior samples. (Online version in 
colour.) 



where r indexes independent samples for the same time point and R = 10. 

Table [3] summarises the results from the MCMC chains for the two models of gene expression. 
Firstly, we can see that despite the relatively small number of molecules in both systems the 
LNA approximation provides very accurate estimates for the true parameters. Moreover we 
can see that the mixing of the Metropolis-Hastings sampler is very poor for both models while 
RMHMC and simplified Manifold MALA algorithms perform very well. This can be explained 
by the strong correlations between parameters in the posterior distribution preventing the M.H. 
sampler to make sufficiently large proposals. For example, the parameters kp, 7p control mRNA 
translation and protein degradation respectively. The concentration of protein molecules is 
directly affected by the two rates and they are expected to be heavily correlated. In Figures [5j a 
Ob we show the marginal joint posterior for parameters kp,jp and 7^,63 for the single gene 
expression model which exhibit very strong positive correlation. Finally figure compares the 
trace plots obtained from MH, SMMALA and RMHMC for parameters 7p and kp of the auto- 
regulatory gene expression model. 



7 Conclusions and Future Work 

Bayesian inference for Markov jump processes is a challenging problem which has many im- 
portant practical applications. Previous research [6] has shown that although exact inference 
is possible, the computational cost and the autocorrelation of the Markov chains is such that 
limits its applicability to small problems. The main problem stems from the requirement to 
simulate the MJP for the trajectory of the system between discrete observations. [13] has shown 
that by considering a diffusion approximation the simulation can be performed in a much more 
efficient manner. In this paper we considered the linear noise approximation which only requires 
to simulate a system of ordinary differential equations while the stochastic fluctuations have 
an exact analytic solution. The linear noise approximation is valid only when the system is 
sufficiently close to its thermodynamic limit, a condition that is also required for the diffusion 
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Single g 


sne expression model. 






Parameters 


f -ft. 


IP 


kp 


bo 


hi 






True values 


n /i/i 

U.44 


0.52 


10.0 


15.0 


0.40 


7 n 
( .u 


o.u 






Metrop olis-Hastings 








(A.R.) 


(0.28) 


(0.33) 


(0.30) 


(0.34) 


(0.29) 


(0.29) 


(0.34) 


(e) 


(0.013) 


(0.007) 


(0.008) 


(0.022) 


(0.056) 


(0.007) 


(0.016) 


Mean 


0.45 


0.54 


10.54 


14.86 


0.39 


7.03 


3.14 


S.D. 


0.017 


0.017 


0.336 


0.509 


0.029 


0.056 


0.149 


ESS 


42 


34 


34 


149 


117 


58 


44 


ESS/time 


1.42 


1.15 


1 15 


5.05 


3.96 


1.96 


1.49 


Simplified Manifold MALA (A.R.= 0.79, e = 1.05 ) 


Mean 


0.45 


0.54 


10.57 


14.88 


0.39 


7.04 


3.17 




0.018 


0.016 


0.306 


0.537 


0.030 


0.053 


1 59 


ESS 


2891 


2911 


2958 


2787 


3310 


3183 


2878 


ESS/time 


83.79 


84.37 


85.73 


80.78 


95.94 


92.26 


83.42 




Manifold HMC (A.R.= 


0.84, e = 


0.91, L=5 ) 






Mean 


0.46 


0.54 


10.57 


14.95 


0.39 


7.04 


3.18 


O.LJ . 


0.018 


0.015 


0.300 


0.555 


0.030 


0.052 


n i ^ 

u. loo 


ESS 


7731 


8238 


8304 


7160 


7380 


7791 


7950 


ESS/time 


0.52 


0.55 


0.56 


0.48 


0.49 


0.52 


0.53 




Auto- 


regulatory 


single g 


ene expression model. 








Metrop olis-Hastings 








(A.R.) 


(0.26) 


(0.36) 


(0.31) 


(0.33) 


(0.24) 


(0.30) 


(0.35) 


(e) 


(0.028) 


(0.012) 


(0.016) 


(0.071) 


(0.231) 


(0.019) 


(0.029) 


Mean 


0.4360 


0.52 


10.40 


14.61 


0.40 


6.82 


3.13 


S.D. 


0.016 


0.018 


0.424 


1.089 


0.076 


0.090 


0.142 


ESS 


201 


71 


73 


465 


339 


420 


239 


ESS/time 


6.12 


2.16 


2.22 


14.17 


10.33 


12.80 


7.28 




Simplified Manifold MALA (A.R .= 


0.71, e = 1.17) 




Mean 


0.43 


0.52 


10.44 


14.24 


0.38 


6.82 


3.12 


S.D. 


0.016 


0.018 


0.422 


1.125 


0.075 


0.091 


0.142 


ESS 


2990 


3270 


3454 


3124 


3164 


3316 


3195 


ESS/time 


76.86 


84.06 


88.79 


80.30 


81.33 


85.24 


82.13 




Manifold HMC (A.R .= 


0.82, e = 


0.91, L=5 ) 






Mean 


0.43 


0.52 


10.43 


14.52 


0.40 


6.82 


3.13 


S.D. 


0.016 


0.017 


0.412 


1.158 


0.078 


0.089 


0.144 


ESS 


6532 


6593 


6614 


5112 


5384 


6595 


6642 


ESS/time 


0.41 


0.41 


0.41 


0.32 


0.34 


0.41 


0.42 



Table 3: Marginal posterior means and standard deviations for the parameters of the single gene 
expression model using simulated data. The ESS is calculated for chains of 10,000 samples after 
a burn-in period of 10,000 iterations with initial parameters randomly sampled from the prior. 
Average acceptance rate (A.R.) and sampler parameters are shown in parenthesis. Notice that 
for the Metropolis-Hastings sampler a different proposal is used for each parameter. The prior 
for all parameters was log 10 A/"(0.0, 2.0). 
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Figure 6: Example trace plots from the auto-regulatory gene expression model for parameters 
7p and hp. Red solid line denotes the true values. (Online version in colour.) 



approximation. Previous research on the linear noise approximation |15| has focussed on the 
Metropolis-Hastings sampler. We have demonstrated here that when the posterior distribution 
exhibits strong correlation between parameters then the Metropolis-Hastings sampler has strong 
auto-correlations. Such correlations are very common for chemical reaction and gene regulatory 
systems. The Riemann manifold MCMC algorithms we considered in this work exploit the ge- 
ometric structure of the target posterior in order to design efficient proposal mechanisms. In 
particular the simplified Manifold MALA algorithm is a conceptually simple algorithm which 
provides a good trade-off between computational cost and sample auto-correlation. 

Although the problems considered in this work are relatively small, but certainly non-trivial, 
we believe that the proposed methodology is applicable for larger and more complex systems. 
The systems we studied in this paper all have a linear dependence on the unknown parameters 
and we have not observed any local modes in our simulations. The analysis of such systems is 
the subject of on-going work. Moreover, in real applications it is not possible to observe the 
populations of all species and there is an additional measurement error term. Extension of the 
LNA to handle such cases is straight forward, see |15| for example, however the effect of partial 
observations and measurement error on the MCMC inference is something that needs to be 
studied in more detail. 
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